# Long video understanding

Eagle 2.5 is a vision-language model (VLM) designed for long-context multimodal learning. It can process video sequences of up to 512 frames as well as high-resolution images.

Transformers Other
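
A 512-frame limit like the one quoted for Eagle 2.5 means that longer videos have to be subsampled before they reach the model. The listing does not say how frames are chosen, so the following is only a minimal sketch that assumes uniform sampling; the function name `sample_frame_indices` and its `max_frames` parameter are hypothetical and not part of any Eagle 2.5 API.

```python
from typing import List


def sample_frame_indices(total_frames: int, max_frames: int = 512) -> List[int]:
    """Pick at most max_frames frame indices spread evenly over a video.

    Assumption: uniform sampling. The real model may select frames differently.
    """
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short video: every frame fits within the budget.
        return list(range(total_frames))
    # Long video: split it into max_frames equal-width segments
    # and take the frame at the centre of each segment.
    step = total_frames / max_frames
    return [int(step * i + step / 2) for i in range(max_frames)]


if __name__ == "__main__":
    indices = sample_frame_indices(total_frames=9000)
    print(len(indices), indices[:3], indices[-1])
```

Taking segment centres rather than segment starts keeps the selected frames away from the very first and very last frame, which is one common choice but by no means the only one.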

Llavaction 0.5B

LLaVAction is a multimodal large language model for action recognition. It is built on the Qwen2 language model and trained on the EPIC-KITCHENS-100-MQA dataset.

Transformers English

MLAdaptiveIntelligence

Timesformer Base Finetuned K600

TimeSformer is a video classification model based on spatio-temporal attention mechanisms, fine-tuned on the Kinetics-600 dataset.

Video Processing
